Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity
نویسندگان
چکیده
In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system’s architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands).
منابع مشابه
Towards Deep Learning in Hindi NER: An approach to tackle the Labelled Data Sparsity
In this paper we describe an end to end Neural Model for Named Entity Recognition (NER) which is based on BiDirectional RNN-LSTM. Almost all NER systems for Hindi use Language Specific features and handcrafted rules with gazetteers. Our model is language independent and uses no domain specific features or any handcrafted rules. Our models rely on semantic information in the form of word vectors...
متن کاملAutomatically Annotated Turkish Corpus for Named Entity Recognition and Text Categorization using Large-Scale Gazetteers
Turkish Wikipedia Named-Entity Recognition and Text Categorization (TWNERTC) dataset is a collection of automatically categorized and annotated sentences obtained from Wikipedia. We constructed large-scale gazetteers by using a graph crawler algorithm to extract relevant entity and domain information from a semantic knowledge base, Freebase1. The constructed gazetteers contains approximately 30...
متن کاملLearning a Named Entity Tagger from Gazetteers with the Partial Perceptro
While gazetteers can be used to perform named entity recognition through lookup-based methods, ambiguity and incomplete gazetteers lead to relatively low recall. A sequence model which uses more general features can achieve higher recall while maintaining reasonable precision, but typically requires expensive annotated training data. To circumvent the need for such training data, we bootstrap t...
متن کاملLearning a Named Entity Tagger from Gazetteers with the Partial Perceptron
While gazetteers can be used to perform named entity recognition through lookup-based methods, ambiguity and incomplete gazetteers lead to relatively low recall. A sequence model which uses more general features can achieve higher recall while maintaining reasonable precision, but typically requires expensive annotated training data. To circumvent the need for such training data, we bootstrap t...
متن کاملTrained Named Entity Recognition using Distributional Clusters
This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recognition. The default feature set of BWI is augmented with features based on distributional term clusters induced from a large unlabeled text corpus. Using no traditional linguistic resources, such as syntactic tags or speci...
متن کامل